Here is an example of possible answers to the practical on fitting the deterministic SEITL model to the Tristan da Cunha outbreak.
Each section below correspond to a section of the practical. Thus, you can have a look at our example for one section and then go back to the practical to answer the following sections.
Although our example refers to the SEITL model, the same commands work for the SEIT4L model (i.e. data(SEIT4L_deter) instead of data(SEITL_deter)).
# the fitmodel
data(SEITL_deter)
# wrapper for posterior
my_posteriorTdC <- function(theta) {
my_fitmodel <- SEITL_deter
my_init.state <- c(S = 279, E = 0, I = 2, T = 3, L = 0, Inc = 0)
# note that for the SEIT4L model there are 4 state variables for the T
# compartment my_init.state <- c('S' = 279, 'E' = 0, 'I' = 2, 'T1' = 3, 'T2'
# = 0, 'T3' = 0, 'T4' = 0, 'L' = 0, 'Inc' = 0)
return(dLogPosterior(fitmodel = my_fitmodel, theta = theta, init.state = my_init.state,
data = FluTdC1971, margLogLike = dTrajObs, log = TRUE))
}
# theta to initialise the MCMC
init.theta <- c(R0 = 2, D_lat = 2, D_inf = 2, alpha = 0.8, D_imm = 16, rho = 0.85)
# diagonal elements of the covariance matrix for the Gaussian proposal
proposal.sd <- c(R0 = 1, D_lat = 0.5, D_inf = 0.5, alpha = 0.1, D_imm = 2, rho = 0.1)
# lower and upper limits of each parameter
lower <- c(R0 = 0, D_lat = 0, D_inf = 0, alpha = 0, D_imm = 0, rho = 0)
upper <- c(R0 = Inf, D_lat = Inf, D_inf = Inf, alpha = 1, D_imm = Inf, rho = 1)
# number of iterations for the MCMC
n.iterations <- 5000
# additional parameters for the adaptive MCMC, see ?mcmcMH for more details
adapt.size.start <- 100
adapt.size.cooling <- 0.999
adapt.shape.start <- 200You can now go back to the practical and try to run MCMC with those settings.
If you didn’t manage to run MCMC, or it took too long to obtain a few thousand iterations, you can load our short run as follows:
data(mcmc_TdC_deter_shortRun)
# this should load 2 objects in your environment: mcmc_SEITL and
# mcmc_SEIT4L. Each one is a list of 3 elements returned by mcmcMH
names(mcmc_SEITL)
## [1] "trace" "acceptance.rate" "covmat.empirical"
# the trace contains 9 variables for 5000 iterations
dim(mcmc_SEITL$trace)
## [1] 5001 9
# let's have a look at it
head(mcmc_SEITL$trace)
## R0 D_lat D_inf alpha D_imm rho log.prior
## 1 2.000000 2.000000 2.0000000 0.8000000 16.00000 0.8500000 -12.81448
## 2 2.000000 2.000000 2.0000000 0.8000000 16.00000 0.8500000 -12.81448
## 3 2.078431 2.479177 1.3236209 0.7551034 15.35135 0.9016641 -12.81448
## 4 2.078431 2.479177 1.3236209 0.7551034 15.35135 0.9016641 -12.81448
## 5 2.820058 2.261659 1.5136383 0.6891596 16.45201 0.7988818 -12.81448
## 6 3.846204 2.449146 0.3711098 0.6083465 15.82348 0.6543921 -12.81448
## log.likelihood log.posterior
## 1 -445.7795 -458.5939
## 2 -445.7795 -458.5939
## 3 -418.1840 -430.9984
## 4 -418.1840 -430.9984
## 5 -391.4595 -404.2740
## 6 -283.3309 -296.1454You can now go back to the practical and analyse this trace.
Here is an example of analysis for our preliminary run:
# convert to a mcmc object for coda
library("coda")
trace <- mcmc(mcmc_SEITL$trace)
# compute the acceptance rate
1 - rejectionRate(trace)
## R0 D_lat D_inf alpha D_imm
## 0.1704 0.1704 0.1704 0.1704 0.1704
## rho log.prior log.likelihood log.posterior
## 0.1704 0.0000 0.1704 0.1704
# between 0.1 and 0.6: looks good!
# let's have a look at the traces
library("lattice") ## for the 'xyplot' command
xyplot(x = trace)Although the chain was started at a init.theta with a low posterior density, it quickly finds the region of the parameter space with high posterior density. Note also the constant trace of the log-prior since we have assumed a uniform prior.
Overall, it looks like the chain reached its target distribution after 1000 steps.
As anticipated from the trace, burning the first 1000 iterations maximizes the effective sample size (ESS).
# Let's create a new trace without the burning
trace.burn <- burnAndThin(trace, burn = 1000)
xyplot(x = trace.burn)# Let's check the ESS
effectiveSize(trace.burn)
## R0 D_lat D_inf alpha D_imm
## 172.59406 68.90601 177.05769 122.62280 101.66500
## rho log.prior log.likelihood log.posterior
## 121.77874 0.00000 139.10356 139.10356Although we have 4000 samples remaining after burning, the ESS is much smaller. This is due to autocorrelation of the chain.
The autocorrelation between samples drops substantially for a lag of 20 iterations. We can thin the trace to reduce the autocorrelation.
# Let's create a thinned trace
trace.burn.thin <- burnAndThin(trace.burn, thin = 20)
xyplot(x = trace.burn.thin)# Let's check the ESS
effectiveSize(trace.burn.thin)
## R0 D_lat D_inf alpha D_imm
## 131.26654 66.87189 191.00000 86.83456 112.46448
## rho log.prior log.likelihood log.posterior
## 96.40768 0.00000 137.34214 137.34214Although the thinned trace has 20 times less fewer than the unthinned trace, it has a similar ESS. This is because the autocorrelation has been reduced.
Let’s compare the posterior estimates of the thinned and unthinned traces.
# The unthinned trace
summary(trace.burn)
##
## Iterations = 1:4001
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 4001
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## R0 17.2400 3.86076 0.0610363 0.293873
## D_lat 1.2168 0.33682 0.0053249 0.040576
## D_inf 10.5268 2.27555 0.0359751 0.171013
## alpha 0.5394 0.03628 0.0005736 0.003276
## D_imm 8.0693 1.97092 0.0311591 0.195471
## rho 0.6998 0.04898 0.0007744 0.004439
## log.prior -12.8145 0.00000 0.0000000 0.000000
## log.likelihood -141.2841 1.68954 0.0267106 0.143252
## log.posterior -154.0985 1.68954 0.0267106 0.143252
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## R0 9.8968 14.4880 16.8702 19.586 24.8133
## D_lat 0.5709 0.9951 1.1892 1.423 1.9408
## D_inf 5.9743 8.8864 10.6053 12.226 14.4630
## alpha 0.4677 0.5150 0.5406 0.565 0.6044
## D_imm 4.4876 6.5216 7.8916 9.520 12.0038
## rho 0.6082 0.6674 0.6995 0.735 0.7919
## log.prior -12.8145 -12.8145 -12.8145 -12.814 -12.8145
## log.likelihood -145.3254 -142.3281 -141.0061 -139.928 -138.9663
## log.posterior -158.1398 -155.1425 -153.8206 -152.743 -151.7808
# The thinned trace
summary(trace.burn.thin)
##
## Iterations = 1:191
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 191
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## R0 17.3988 3.85418 0.278878 0.336399
## D_lat 1.2195 0.33718 0.024398 0.041233
## D_inf 10.6031 2.24883 0.162720 0.162720
## alpha 0.5389 0.03511 0.002541 0.003768
## D_imm 8.0310 1.98373 0.143538 0.187058
## rho 0.6987 0.04730 0.003423 0.004818
## log.prior -12.8145 0.00000 0.000000 0.000000
## log.likelihood -141.2704 1.69755 0.122830 0.144851
## log.posterior -154.0848 1.69755 0.122830 0.144851
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## R0 10.0298 14.6419 17.1295 19.6005 24.7571
## D_lat 0.5684 1.0116 1.1961 1.4486 1.9094
## D_inf 6.3118 9.1569 10.5184 12.4720 14.2501
## alpha 0.4685 0.5176 0.5402 0.5620 0.5992
## D_imm 4.4508 6.5154 7.8623 9.5462 11.7747
## rho 0.6094 0.6675 0.7011 0.7334 0.7807
## log.prior -12.8145 -12.8145 -12.8145 -12.8145 -12.8145
## log.likelihood -145.3972 -142.1858 -140.9833 -139.8503 -138.9644
## log.posterior -158.2117 -155.0003 -153.7978 -152.6648 -151.7789They are very similar. So why thin? Because autocorrelation produces clumpy samples that are unrepresentative, in the short run, of the true underlying posterior distribution. We can check this by comparing the thinned and unthinned distributions using the function plotPosteriorDensity of the fitR package:
The thinned trace shows a smoother distribution despite having less samples than the unthinned one. This because the local “bumps” of the unthinned distribution are caused by autocorrelated samples.
You can now go back to the practical and perform a similar analysis for a long-run MCMC.
Here is an example of an analysis for our long run (\(10^5\) iterations)
# load mcmc output
data(mcmc_TdC_deter_longRun)
# create mcmc objects for both traces
library("coda")
trace1 <- mcmc(mcmc_SEITL_theta1$trace)
trace2 <- mcmc(mcmc_SEITL_theta2$trace)
# combine traces as mcmc.list object
trace <- mcmc.list(list(trace1, trace2))
# let's have a look
head(trace, 3)
## [[1]]
## Markov Chain Monte Carlo (MCMC) output:
## Start = 1
## End = 4
## Thinning interval = 1
## R0 D_lat D_inf alpha D_imm rho log.prior
## 1 2.000000 2.000000 2.000000 0.800000 16.00000 0.8500000 -12.81448
## 2 2.000000 2.000000 2.000000 0.800000 16.00000 0.8500000 -12.81448
## 3 3.291068 1.927748 2.139757 0.824364 19.14906 0.8684192 -12.81448
## 4 3.644072 1.868155 1.899170 0.809881 18.47234 0.8769007 -12.81448
## log.likelihood log.posterior
## 1 -445.7795 -458.5939
## 2 -445.7795 -458.5939
## 3 -442.6925 -455.5070
## 4 -438.2018 -451.0162
##
## [[2]]
## Markov Chain Monte Carlo (MCMC) output:
## Start = 1
## End = 4
## Thinning interval = 1
## R0 D_lat D_inf alpha D_imm rho log.prior
## 1 20.00000 2.00000 2.000000 0.10000000 8.000000 0.3000000 -12.81448
## 2 20.00000 2.00000 2.000000 0.10000000 8.000000 0.3000000 -12.81448
## 3 18.71727 2.70723 1.014131 0.07367196 9.405255 0.2468698 -12.81448
## 4 18.71727 2.70723 1.014131 0.07367196 9.405255 0.2468698 -12.81448
## log.likelihood log.posterior
## 1 -321.5801 -334.3946
## 2 -321.5801 -334.3946
## 3 -320.1123 -332.9268
## 4 -320.1123 -332.9268
##
## attr(,"class")
## [1] "mcmc.list"
# acceptance rate
1 - rejectionRate(trace)
## R0 D_lat D_inf alpha D_imm
## 0.20254 0.20254 0.20254 0.20254 0.20254
## rho log.prior log.likelihood log.posterior
## 0.20254 0.00000 0.20254 0.20254
# close to the optimal value of 0.234
# ESS
effectiveSize(trace)
## R0 D_lat D_inf alpha D_imm
## 5437.065 6163.158 6093.018 7106.347 6806.269
## rho log.prior log.likelihood log.posterior
## 7453.447 0.000 5458.377 5458.377
# plot the traces
library("lattice") ## for the 'xyplot' command
xyplot(trace) Note that the acceptance rate and the ESS are computed for the combined chain whereas the traces are plotted for each chain. Also, given the very high ESS we can reasonably choose a burn-in visually, say 5000 iterations.
trace.burn <- burnAndThin(trace, burn = 5000)
# removing the burning increases the ESS
effectiveSize(trace.burn)
## R0 D_lat D_inf alpha D_imm
## 5858.275 6107.597 6535.558 6887.668 6543.338
## rho log.prior log.likelihood log.posterior
## 7163.971 0.000 5612.370 5612.370
# autocorrelation
acfplot(trace.burn, lag.max = 60)Again, given the very high ESS, we can be quite generous in our choice of the thinning.
# Thinning: let's keep 1 iteration every 40
trace.burn.thin <- burnAndThin(trace.burn, thin = 40)
xyplot(trace.burn.thin) However, let’s compare the thinned and unthinnned distributions.
# Note that plotPosteriorDensity can take a list of mcmc.list It will plot
# the different mcmc.list by combining their elements Let's plot the
# combined unthinned trace vs the combined thinned trace.
plotPosteriorDensity(list(unthinned = trace.burn, thinned = trace.burn.thin))In contrast to the previous short-run, they are almost no difference between the thinned and unthinned chains. Indeed, with such a long chain, the clumpy autocorrelation has been averaged out!
In fact, there are several references that show that the longer (unthinned) chain usually yields better estimates of the true posterior than the shorter thinned chain, even for percentiles in the tail of the distribution. That said, thinning can be useful for other reasons, such as memory or time constraints in post-chain processing.
Now, we can compare whether the two independent chains, started at theta1 and theta2, have converged to the same posterior distribution
Since the chains have converged to the same posterior, we can use the combined estimates
# the function summary combines the chains of a mcmc.list
summary(trace.burn.thin)
##
## Iterations = 1:2318
## Thinning interval = 1
## Number of chains = 2
## Sample size per chain = 2318
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## R0 17.0909 4.30760 0.0632650 0.0721890
## D_lat 1.1458 0.35552 0.0052214 0.0057899
## D_inf 10.7516 2.23701 0.0328546 0.0349587
## alpha 0.5373 0.03823 0.0005615 0.0006162
## D_imm 7.7800 2.11592 0.0310762 0.0315506
## rho 0.6957 0.05189 0.0007621 0.0007799
## log.prior -12.8145 0.00000 0.0000000 0.0000000
## log.likelihood -141.2702 1.71514 0.0251900 0.0282695
## log.posterior -154.0846 1.71514 0.0251900 0.0282695
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## R0 9.9720 13.9366 16.6412 19.9047 26.4282
## D_lat 0.4806 0.8984 1.1447 1.3817 1.8422
## D_inf 6.4954 9.1177 10.7736 12.4811 14.6550
## alpha 0.4588 0.5122 0.5386 0.5633 0.6098
## D_imm 4.1559 6.3534 7.6108 9.0465 12.5091
## rho 0.5987 0.6601 0.6942 0.7295 0.8023
## log.prior -12.8145 -12.8145 -12.8145 -12.8145 -12.8145
## log.likelihood -145.5656 -142.1288 -140.9693 -140.0230 -138.8926
## log.posterior -158.3801 -154.9433 -153.7838 -152.8375 -151.7071Running several independent chains starting from different parts of the parameter space allows us to check whether the posterior distribution is multi-modal. If so, then we must be careful when combining the chains. For instance, an estimate of the mean computed with summary won’t be meaningful for a parameter with a multi-modal posterior.
By contrast, for a unimodal posteriors, combining chains is an efficient way to increase the ESS and the precision of the posterior estimates. Furthermore, running several “shorter” chains in parallel is faster than running one “long” chain.
Finally, let’s assess the fit of the deterministic SEITL model.
# load data
data(FluTdC1971)
# the same init.state as for the fit
init.state <- c(S = 279, E = 0, I = 2, T = 3, L = 0, Inc = 0)
# by default plotPosteriorFit summarize the fit of 100 thetas sampled from
# the posterior
plotPosteriorFit(trace = trace, fitmodel = SEITL_deter, init.state = init.state,
data = FluTdC1971)
# alternatively, one can plot the fit of the mean of the posterior (in this
# case the observation is replicated 100 times)
plotPosteriorFit(trace = trace, fitmodel = SEITL_deter, init.state = init.state,
data = FluTdC1971, posterior.summary = "mean")
# or using the maximum a posteriori (MAP) estimate
plotPosteriorFit(trace = trace, fitmodel = SEITL_deter, init.state = init.state,
data = FluTdC1971, posterior.summary = "max")Note that the 95% credible intervals (CI) for the posterior fit under the MAP captures the highest data point. By contrast, the fit of the second peak seems quite poor, even for the MAP.
You can now go back to the practical and look at the posterior correlations between the parameters.
The correlation of the posterior distribution can be investigated using levelplot.
library("lattice") ## for the 'levelplot command
# levelplot doesn't accept `mcmc.list`, we pass the first `mcmc` only.
levelplot(trace.burn.thin[[1]], col.regions = heat.colors(100))Note the strong positive correlations (~0.8) between \(R_0\) and \(D_{lat}\) and between \(R_0\) and \(D_{inf}\). In order to explain the wide 95% CIs of \(R_0\) and \(D_{inf}\), let’s have a look at the contact rate \(\beta = R_0/D_{inf}\).
with(as.data.frame(trace.burn.thin[[1]]), quantile(R0/D_inf, probs = c(0.025,
0.25, 0.5, 0.75, 0.975)))
## 2.5% 25% 50% 75% 97.5%
## 1.091157 1.395928 1.573898 1.771753 2.203119The posterior value of \(\beta\) varies somewhat less than the posterior values of \(R_0\) and \(D_\mathrm{inf}\). Indeed, this parameter is constrained by the shape of the initial phase of the outbreak. Conversely, there are an infinite number of combinations of \(R_0\) and \(D_{inf}\) that lead to the same \(\beta\), hence their wide 95% CIs.
A second effect that could explain the wide posterior density in \(R_0\) is the very high attack rate. Indeed, once \(R_0>5\) it doesn’t make much difference whether \(R_0\) is equal to, say, 10 or 20.
We can also note that the posterior estimate of \(D_{inf} = 11\) days (95% CI: \([6-15]\)) is biologically unrealistic based on previous empirical estimates. However, our approach did not include any prior information as the default SEITL_deter fitmodel comes with uniform priors for all parameters.
In order to include previous empirical information on \(D_{lat}\) and \(D_{inf}\), let’s modify the dprior function of SEITL_deter as follows:
SEITL_deter$dprior <- function(theta, log = FALSE) {
# package with truncated normal distribution
library(truncnorm)
log.prior.R0 <- dunif(theta[["R0"]], min = 1, max = 50, log = TRUE)
# normal distribution with mean = 2 and sd = 1 and truncated at 0
log.prior.latent.period <- log(dtruncnorm(theta[["D_lat"]], a = 0, b = Inf,
mean = 2, sd = 1))
# normal distribution with mean = 2 and sd = 1 and truncated at 0
log.prior.infectious.period <- log(dtruncnorm(theta[["D_inf"]], a = 0, b = Inf,
mean = 2, sd = 1))
log.prior.temporary.immune.period <- dunif(theta[["D_imm"]], min = 0, max = 50,
log = TRUE)
log.prior.probability.long.term.immunity <- dunif(theta[["alpha"]], min = 0,
max = 1, log = TRUE)
log.prior.reporting.rate <- dunif(theta[["rho"]], min = 0, max = 1, log = TRUE)
log.sum <- log.prior.R0 + log.prior.latent.period + log.prior.infectious.period +
log.prior.temporary.immune.period + log.prior.probability.long.term.immunity +
log.prior.reporting.rate
return(ifelse(log, log.sum, exp(log.sum)))
}Note the choice of a truncated normal distribution since \(D_{lat}\) and \(D_{inf}\) must be positive.
You can now go back to the practical and run a MCMC with this informative prior.
Here we combine both chains with informative priors and compare the posterior distribution with the one above.
library("coda")
# create mcmc object
trace.info1 <- mcmc(mcmc_SEITL_infoPrior_theta1$trace)
trace.info2 <- mcmc(mcmc_SEITL_infoPrior_theta2$trace)
# combine in a mcmc.list
trace.info <- mcmc.list(trace.info1, trace.info2)
# burn and thin as the chain with uniform prior (see above sections)
trace.info.burn.thin <- burnAndThin(trace.info, burn = 5000, thin = 40)
# check that both chains converged to the same posterior
plotPosteriorDensity(trace.info.burn.thin)# compare the effect of informative priors on the posterior distribution
plotPosteriorDensity(list(unif = trace.burn.thin, info = trace.info.burn.thin))\(R_0\) and \(D_{inf}\) have very different posterior. This is expected since there is an informative prior on \(D_{inf}\) and \(R_0\) is strongly correlated to \(D_{inf}\). Note also that the mode of all other parameters have changed, though less than \(D_{inf}\) and \(R_0\). This illustrate the influence that one prior can have on the full posterior distribution.
You can now go back to the practical.
# combine the two chains in a data frame
library("plyr") # needed for the 'ldply' function used in the next line
trace.combined <- ldply(trace.info.burn.thin)
# take the mean of theta
theta.bar <- colMeans(trace.combined[SEITL_deter$theta.names])
print(theta.bar)
## R0 D_lat D_inf alpha D_imm rho
## 7.6250928 1.2904394 3.6664971 0.4751491 9.1354413 0.6468514
# compute its log-likelihood
init.state <- c(S = 279, E = 0, I = 2, T = 3, L = 0, Inc = 0)
log.like.theta.bar <- dTrajObs(SEITL_deter, theta.bar, init.state, data = FluTdC1971,
log = TRUE)
print(log.like.theta.bar)
## [1] -142.84
# and its deviance
D.theta.bar <- -2 * log.like.theta.bar
print(D.theta.bar)
## [1] 285.6799
# the effective number of parameters
p.D <- var(-2 * trace.combined$log.likelihood)/2
print(p.D)
## [1] 8.116991
# and finally the DIC
DIC <- D.theta.bar + 2 * p.D
print(DIC)
## [1] 301.9139Follow this link to go back to the practical.